Self-Training for Biomedical Parsing
Authors
Abstract
Parser self-training is the technique of taking an existing parser, parsing extra data and then creating a second parser by treating the extra data as further training data. Here we apply this technique to parser adaptation. In particular, we self-train the standard Charniak/Johnson Penn-Treebank parser using unlabeled biomedical abstracts. This achieves an f-score of 84.3% on a standard test set of biomedical abstracts from the Genia corpus. This is a 20% error reduction over the best previous result on biomedical data (80.2% on the same test set).
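The loop is simple enough to state in a few lines. Below is a minimal Python sketch, assuming a hypothetical `Parser` interface with `train` and `parse` methods; it stands in for the two-stage Charniak/Johnson parser and is not the authors' actual implementation.

```python
# Minimal self-training sketch. `Parser` is a hypothetical stand-in for a
# trainable constituency parser such as the two-stage Charniak/Johnson parser.

class Parser:
    """Placeholder interface: implement with a real parser to use the loop."""

    def train(self, treebank):
        """Fit the model on a list of (sentence, tree) pairs."""
        raise NotImplementedError

    def parse(self, sentence):
        """Return the highest-scoring tree for `sentence`."""
        raise NotImplementedError


def self_train(make_parser, gold_treebank, raw_sentences):
    """Train on gold trees, auto-parse raw text, then retrain on the union."""
    first_stage = make_parser()
    first_stage.train(gold_treebank)

    # Treat 1-best parses of unlabeled in-domain text (here, biomedical
    # abstracts) as additional, noisier training data for a second parser.
    auto_labeled = [(s, first_stage.parse(s)) for s in raw_sentences]

    second_stage = make_parser()
    second_stage.train(gold_treebank + auto_labeled)
    return second_stage
```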
Similar papers
A Word Clustering Approach to Domain Adaptation: Effective Parsing of Biomedical Texts
We present a simple and effective way to perform out-of-domain statistical parsing by drastically reducing lexical data sparseness in a PCFG-LA architecture. We replace terminal symbols with unsupervised word clusters acquired from a large newspaper corpus augmented with biomedical target-domain data. The resulting clusters are effective in bridging the lexical gap between source-domain and targ...
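As an illustration of the terminal-replacement idea, here is a hedged Python sketch: rare words are backed off to unsupervised cluster IDs before training or parsing. The `word2cluster` table (e.g., from Brown clustering of the newspaper plus biomedical text), the frequency threshold, and all names are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative terminal-replacement step: rare words back off to unsupervised
# cluster IDs, shrinking the lexicon the PCFG-LA grammar has to cover.
# `word2cluster`, `vocab_counts`, and `min_count` are assumed inputs.

def clusterize(tokens, word2cluster, vocab_counts, min_count=5):
    """Replace infrequent tokens with their cluster ID to reduce sparseness."""
    out = []
    for tok in tokens:
        if vocab_counts.get(tok, 0) >= min_count:
            out.append(tok)  # frequent word: keep the surface form
        else:
            # Rare word: use its cluster, or a generic unknown symbol.
            out.append(word2cluster.get(tok, "<UNK>"))
    return out
```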
Treeblazing: Using External Treebanks to Filter Parse Forests for Parse Selection and Treebanking
We describe “treeblazing”, a method of using annotations from the GENIA treebank to constrain a parse forest from an HPSG parser. Combining this with self-training, we show significant dependency score improvements in a task of adaptation to the biomedical domain, reducing error rate by 9% compared to out-of-domain gold data and 6% compared to self-training. We also demonstrate improvements in ...
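The blazing step can be pictured as a compatibility filter over candidate analyses. The sketch below simplifies the parse forest to an n-best list and keeps only candidates whose brackets do not cross the spans taken from the external GENIA annotation; the span representation and crossing test are my illustrative assumptions, not the paper's method.

```python
# Simplified "blazing" filter: a candidate parse survives only if none of its
# bracket spans cross a span from the external annotation. Candidates are
# assumed to expose a `spans` attribute of (start, end) pairs.

def crosses(a, b):
    """True if spans a=(i, j) and b=(k, l) overlap without nesting."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j

def blaze_filter(nbest, blaze_spans):
    """Keep candidates compatible with the external (e.g., GENIA) spans."""
    kept = [t for t in nbest
            if not any(crosses(s, b) for s in t.spans for b in blaze_spans)]
    return kept or nbest  # fall back to the full list if nothing survives
```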
Weakly supervised training for parsing Mandarin broadcast transcripts
We present a systematic investigation of applying weakly supervised co-training approaches to improve performance when parsing Mandarin broadcast news (BN) and broadcast conversation (BC) transcripts, by iteratively retraining two competitive Chinese parsers from a small set of treebanked data and a large set of unlabeled data. We compare co-training to self-training, and our results sho...
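Schematically, co-training alternates between two different parsers, each retrained on the other's most confident automatic parses. The Python sketch below assumes hypothetical parser objects whose `parse` returns a tree carrying a `score` and its `sentence`; the round count and selection size are illustrative, not the paper's settings.

```python
# Schematic co-training loop: each parser is retrained on the other's most
# confident automatic parses. Parser objects, `t.score`, `t.sentence`,
# `rounds`, and `k` are illustrative assumptions.

def co_train(parser_a, parser_b, gold, raw_sentences, rounds=5, k=100):
    """Iteratively grow each parser's training set with the other's output."""
    extra_a, extra_b = [], []
    for _ in range(rounds):
        parser_a.train(gold + extra_a)
        parser_b.train(gold + extra_b)

        # Each parser labels the raw text; its k most confident parses become
        # new training material for the *other* parser.
        parses_a = sorted((parser_a.parse(s) for s in raw_sentences),
                          key=lambda t: t.score, reverse=True)
        parses_b = sorted((parser_b.parse(s) for s in raw_sentences),
                          key=lambda t: t.score, reverse=True)
        extra_b += [(t.sentence, t) for t in parses_a[:k]]
        extra_a += [(t.sentence, t) for t in parses_b[:k]]
    return parser_a, parser_b
```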
Exploring Self-training and Co-training for Dependency Parsing
We explore the effect of self-training and co-training on Hindi dependency parsing. We use the Malt parser, which is a state-of-the-art Hindi dependency parser, and apply self-training using a large unannotated corpus. For co-training, we use the MST parser, whose accuracy is comparable to the Malt parser's. Experiments are performed using two types of raw corpora: one from the same domain as the test data and...
Faster Parsing by Supertagger Adaptation
We propose a novel self-training method for a parser which uses a lexicalised grammar and supertagger, focusing on increasing the speed of the parser rather than its accuracy. The idea is to train the supertagger on large amounts of parser output, so that the supertagger can learn to supply the supertags that the parser will eventually choose as part of the highest-scoring derivation. Since the ...
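The idea can be pictured as ordinary supervised supertagger training whose data is harvested from the parser's own best derivations. In the hypothetical Python sketch below, `derivation.supertags` and the `train` method are assumed names, not the actual system's API.

```python
# Speed-oriented self-training sketch: retrain the supertagger on the supertag
# sequences the parser actually selects, so it proposes tags the parser would
# keep anyway. `derivation.supertags` and `train` are assumed names.

def adapt_supertagger(supertagger, parser, raw_sentences):
    """Fit the supertagger to the parser's own highest-scoring derivations."""
    auto_data = []
    for sent in raw_sentences:
        derivation = parser.parse(sent)  # highest-scoring derivation
        auto_data.append(list(zip(sent, derivation.supertags)))
    supertagger.train(auto_data)
    return supertagger
```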
Journal:
Volume, Issue:
Pages: -
Publication date: 2008